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1. INTRODUCTION 



The purpose of this Technical Memorandum is to illustrate the capabilities 
of the parallel organized SOLOMON II Computer. Particular attention will be 
paid to the application of the SOLOMON II System to various aspects of the 
numerical weather forecasting problem. It is believed that the best possible 
method of demonstrating these capabilities is to illustrate a simple primitive 
equation model in detail plus other related problems which, taken together, 
contain types of operations typically encountered in numerical weather fore- 
casting. In addition, a detailed coding of the primitive equation mode is pre- 
sented to demonstrate the principles of SOLOMON II programming and to ob- 
tain a running time estimate for this model. 

For those not familiar with SOLOMON II programming, it is suggested that 
the "SOLOMON II Programmers Reference Manual" be read concurrently, in 
order for the reader to obtain a better understanding of the SOLOMON II pro- 
grams presented in this report. Also, for those interested in the background 
material on the numerical weather forecasting topics presented, it is recom- 
mended that the appropriate references, listed at the end of this report, be 
reviewed. 

Also included in this report are brief descriptions of general physical char- 
acteristics and applicable subroutines associated with the SOLOMON II Com- 
puter. For more detailed information on the technical and physical aspects 
of the SOLOMON II System, reference should be made to SOLOMON Project 
Technical Memorandum No. 25, "SOLOMON II Physical Characteristics." 
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2. CONCEPT OF WRITING ONE PROGRAM TO CONTROL 

THE NETWORK AND NCU 



Although the Network Control Unit (NCU) and the Processing Element (PE) 
Network appear to be two separate units (see figure 2-1), each with its own 
set of instructions, a program interspersing commands for both units will 
cause concurrent operation of the two. 



Figure 2-1. Network Control Unit and Processing Element Network 

The NCU monitors all instructions. An instruction is brought from Central 
Memory (CM) into the NCU. If it is a PE instruction, it is sent to the Network 
for execution and the next instruction is brought from CM. By interspersing 
NCU commands and Network commands, the two units perform simultaneously. 

Suppose, for example, a program contains a PE multiply instruction 
(execution time is 13 microseconds). The program can be written so that 
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about 6 NCU commands can be executed during this time. Thus in some cases 
most of the PE bookkeeping (i.e. , indexing, shifting, etc) can be done without 
loss of time . 

A program written with interspersed commands will enable SOLOMON II 
to obtain the maximum in concurrent operation. 

Also in line with the above, the operations of both the NCU and the PE Net- 
work can be executed by the same program. This is due to the fact that all 
instructions are extracted from NCU memory and those pertaining to the PE 
Network are given to the Network to execute via the NCU, while those pertain- 
ing to the NCU are executed by the NCU. Thus one program is capable of 
controlling both the NCU and the PE Network either separately or together 
(concurrently), depending upon how the two types of instructions are inter- 
spersed. It should be mentioned that the main computing power is in the Net- 
work while the NCU only performs operations such as indexing, bookkeeping, 
and data transfer. 
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3. THE USE OF MODE CONTROL 



Mode control is one of two methods provided for referencing one or more 
groups of PE's. At any point in the program any PE may be set to one of four 
modes (0, 1, 2, 3). Once a PE is set to a mode it has the option of respond- 
ing or not responding to a given instruction according to the PE mode and the 
mode control indicated in the instruction (i.e. , PE's in Mode 1 respond to in- 
structions addressed to Ml but do not respond to instructions addressed to 
MO, 2, 3). Thus, the mode status provides the programmer a means of con- 
trol of special situations in a program. 

Assume, for example, a weather problem where each PE represents a 
given grid point in the system. Assume also that the PE arrangement is a 
fixed square > 2 by 2 (this is purely an assumption, however, as the arrange- 
ment may be any configuration). Mode control will provide an effective method 
of differentiating between boundary points and internal mesh points in the 

square arrangement. 

2 

The equation V $ = F(x,y) will serve well as an example of mode control. 
An averaged <j) must be found for both poles. (Assume we are working with 
the north pole. All points have a value for but only those adjacent to poles 
are to be averaged.) In a matrix array of PE's the uppermost row is assumed 
to be the north -most row. Mode control provides a method so that only those 
PE's concerned with the north row will be averaged and ensures that the inter- 
nal mesh points will not respond to the averaging instructions. This method 
is as follows: 

a. Set all PE's to Mode 

b. Set the uppermost or north row to Mode 2 

c. Instructions for the averaging routine will reference those PE's in 
Mode 2 only. (This same routine will be repeated for the south pole $.) 
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Therefore, in the above example, mode control provides boundary control. 
The relaxation routine in a weather problem also depends on mode control 

for its completion. For each grid point there is an original assumption for 

$ ij m 
\\) =| . This [\) is then substituted and the resulting R .. is tested against 

m ^ 
a given e. If e - R. . < 0, convergence has not occurred. A new value of 

ij 

ijj is computed and the loop is recycled; i.e. , the new value of iji is substituted 
and convergence is again tested. 

The PE's are originally set to Mode 0. At the end of the first cycle, any 
PE which has not converged is set to Mode 2. This is accomplished by check- 
ing the sign bit; i.e., any PE with a negative result is set to Mode 2. The 
mode status is then tested. If any PE is in Mode 2, all PE's are again set to 
Mode and the loop is recycled. Only when all PE's remain in Mode at the 
end of a cycle has convergence occurred. Thus mode control allows the 
termination of a loop. 

The above two examples for the use of mode control (boundary control and 
loop control) are a small sample of its many uses. 

The availability of only four mode states may seem a restriction on the 
programmer but the ability to store and load mode states (mode map) into 
and from the PE core memory provides the programmer with unlimited (actu- 
ally limited only by the amount of core storage available) mode states. The 
PE instruction set contains the necessary instructions for mode control. In 
addition, the programmer can change the mode state of PE's based on the 
result of an arithmetic operation, overflow conditions, bit checks, etc. Thus 
mode control provides the programmer with a powerful method of control. 
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4. GEOMETRIC CONTROL. 



A second method of idling some PE's is to use two registers which exert 
what is called geometric control. One register, called the row select regis- 
ter, is associated with the rows of the square array of PE's and the other, 
similarly associated with the columns, is called the column geometric regis- 
ter. The row geometric register contains exactly one bit corresponding to 
each row of PE's. The column geometric register contains exactly one bit 
corresponding to each column of PE's. The contents of the geometric control 
portion of the Machine Option register determines which of the four geometric 
possibilities is to be effective during an instruction: 

a. Neither register is active 

b. The row geometric register is active and the column geometric 
register is inactive. 

c. The column geometric register is active and the row geometric 
register is inactive. 

d. Both registers are active. 

When either register is active, its effect is as follows. If the k-th bit in the 
row (column) geometric register is a 0, then the PE's in the k-th row (col- 
umn) are all idle. If this bit is a 1, then these PE's are active, though still 
subject, of course, to mode control. When either of the registers is inactive, 
it does not idle any of the PE network; i.e. , an inactive register is equivalent 
to a register filled entirely with l's. In the 4 by 4 sample network shown in 
figure 4-1 below, the PE's behave as follows as a result of geometric con- 
trol. In case a listed above, all 16 PE's are active. In case b, all 12 PE's 
which are shaded with horizontal or with both horizontal and vertical lines 
are active. In case c, the eight PE's shaded with vertical or with vertical 
and horizontal lines are active. In case d, the six PE's which are shaded 
with both horizontal and vertical lines are the only ones active. 
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Figure 4-1. 4x4 Sample Network 



Geometric control has many uses. The following examples will demon- 
strate some of these: 

a. Establish Boundary Conditions (see figure 4-2) 

Presume all PE's are in Mode 0. The GR would then be loaded with con- 
stants (k where k = Oil). An instruction will then set all activated PE's to 
Mode 3, thus establishing the boundary conditions with all boundary PE's in 
Mode (see figure 4-3). Actually for this simple case the programmer can 
specify that geometric control is active (both) and the mode state of the bound 
ary PE's are immaterial. 

b. Input of Data 

As a continuation of a, suppose data is to be sent to boundary points only. 
Using the L-Buffer, where data are stored in locations 0, 1, ... 31, an 
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Figure 4-2. Boundary Conditions 

instruction will transfer the contents of the L-Buffer to corresponding active 
PE's (as designated by l's in CG and RG). Thus data is brought into the 
Network under geometric control, 
c. Output of Data 

If only certain data is to be output, geometric control is employed to 
transfer the data from the PE Network to the L-Buffer (see figure 4-4). 

Suppose the PE containing the desired information is in Mode 3. Any 
instruction will command any column with active PE's (i.e. , with a PE in 
Mode 3) to place a 1 in the corresponding bit position of the CG. The next 
instruction, with the combined use of geometric and mode control, will trans- 
fer data from the Mode 3 PE in column 1 to the L-Buffer for output. In this 
example geometric control provides output control. 
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Figure 4-3. Boundary PE's in Mode 
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Figure 4-4. Transfer of Data from PE Network to L-Buffer 
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5. USE OF THE BROADCAST REGISTER 



One of the features of SOLOMON is the Broadcast Register (B); a 20 -bit 
register which transfers data from Central Memory (CM) to the PE Network. 

Suppose a problem requires constants for its solution. These constants 
are stored in CM. One instruction transfers the data from CM to B thus 
making the constant available to all PE's. After the constant is in B, it may 
be used for various arithmetric operations in addition to being broadcast to 
all PE's. By using B the PE memory is preserved (one memory location as 
compared to one per PE). 

The register is also adaptable (as an option) for special usage. For exam- 
ple, problems using polar boundaries require values to the north or south of 
the base PE. These values come from neighboring PE's for internal points. 
The acquisition of these values for the polar parameters, however, poses 
special problems. An easy solution is to use B as a north or south routing 
(i.e. , provide north and south averaged values) thus permitting polar PE's 
to gain needed information. 

Thus two uses of B are: 

a. Providing constants for use in all PE's 

b. Providing north or south routings for special problems such as 
supplying polar values when needed. 
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6. INTERCONNECTIONS BETWEEN PE'S 



An important feature of the PE network is that neighboring PE's are 
connected so that each PE has available, as well as the data in its own mem- 
ory, the data in the memory of its four nearest neighbors. 

Ordinarily, the PE's are thought of as lying in a square array in a plane 
(see figure 6-1). The basic SOLOMON network is considered to have an 
array of 32 by 32 PE's. Without any design changes, however, modules of 
256 (3 2 x 8) PE's can be added to or removed from the network. Thus the 
minimal network contains 3 2 by 8 PE's. For a PE which is interior to the 
array, the four nearest neighbors are designated as east (E), north (N) , west 
(W), and south (S) (see figure 6-2). The PE's along the edges have only three 
nearest neighbors, while those on the corners have only two. 
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Figure 6-1. PE's in Square Array in a Plane 
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Figure 6-2. Four Nearest Neighbors 
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7. INDEX REGISTERS 



The SOLOMON II System has seven index registers. The two main uses 
of the index registers are to provide loop control and to provide a means of 
picking an element out of an array. 

The following example illustrates the use of an index register (in this case 
XI) to provide loop control. Suppose there is a series of N instructions that 
must be performed Z times. Then the following loop could be constructed: 
TIX XI, Z (Set XI to Z) 

Begin Inst. 1 

Inst. 2 



Inst. N 

SIX XI, -1 Subtract 1 from XI 

JXZ ^, XI, Begin (if XI ^ 0, jump to Begin) 

The loop is performed Z times. When the Z-th time is completed, XI = 
and the next instruction following this loop will be executed. 

Suppose that there exists a one dimensional array of n elements stored 
sequentially as: 

address ELEMENT 

address + 1 ELEMENT 1 

address + 2 ELEMENT 2 



address + n-1 ELEMENT n-1 
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Suppose it is now desirable to perform an operation on the Z-th element of 
this array (0 < Z < n). Also suppose that the desired operation is a TMP. 
This could be performed as follows: 

TIX X, Z-l Set index register X to Z-l 

TMP ELEMENT, X 

The address of ELEMENT is added to the contents of X and the element in 
this new address is transferred to the P register. This element is indeed the 
desired Z-th element of the array. 

Since the SOLOMON II System has seven index registers, it is possible to 
have control over many loops and arrays with a minimum of manipulation. 
As a final comment on this topic, any one of the seven available index regis- 
ters can be used to control loops in either the PE Network or in the Central 
Control Unit. Hence the programmer has available seven index registers 
that can be used to index PE instructions or NCU instructions. 
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8. DOUBLE PRECISION ARITHMETIC 



8. 1 DOUBLE PRECISION MULTIPLY 

This program was written for application to problems which require great- 
er than average precision. A double precision number (2 + 38 bits) is rep- 
resented by 2 (1 + 19) bit numbers. A restriction on the location of the most 
significant and least significant 20 bits is not necessary but in general they 
will be located in sequential memory locations. A double precision number 
can be expressed as the sum of the most significant 19 bits plus the least 
significant 19 bits as A + AA. The product of two such numbers yields the 
identity: 

(A + AA) (B + AB) = AB + AAB + AAB + AAAB (8-1) 

19 

In fixed point form, (A + AA) can be expressed as (A* 2 + AA*2°). Thus 

the right side of equation 8-1 can be expressed as in equation 8-2 

38 1 9 19 

AB-2 + AAB- 2 + AAB- 2 + AAAB- 2° (8-2) 

Note from equation 8-2 that the sign positions of the numbers will line up 
as shown in figure 8-1. 

One obtains the 80 -bit product by adding the appropriate single length 
blocks of 20 bits. Care must be taken to preserve the carry bit from the 
addition of each pair of blocks. The carry bits are added into the next col- 
umn of blocks. The use carry (UC) option on the addition operation adds in 
the carry flip-flop. In general, it is impossible to get overflow in the multiply 
operation. However, since the 2's compliment representation is used for 
negative numbers it is possible to have (-1) x (-1) = +1 which causes an over- 
flow. This is the only case. It is not necessary to sense overflow in the 
addition operation except in the final column. If overflow occurs in the 
second or third column, only the sign bit will be wrong. But these are changed 
to match the sign of column I by the algebraic shift of places at the end of 
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Figure 8-1. Sign Position of Numbers 



the program. If overflow occurs in column 1, a jump instruction transfers 
the computer out of the program to an instruction called OUT. (OUT must 
be designated in the main program.) 

The resultant is stored in four locations called Rl, R2, R3 , and R4. Rl 
is the most significant 19 bits and R4 is the least significant. Note that the 
same digit occupies all four sign bits located within the 80-bit result. 
8. 2 DOUBLE PRECISION ADD 

This program was written to accompany the double precision multiply 
program. A double precision number is expressed as (A + A A) where A is 
the most significant 19 bits and A A is the least significant. Since the 
sign bits of two double precision numbers line up, the numbers can be added 
in blocks of 19 bits (see figure 8-2). 

The carry from column 2 is added to column 1 by the UC option on addition 
The necessity of detecting overflow is left to the programmer. The most 
significant 19 bits of the result are stored in the P Register and the least 



8-2 



15090A 



Computer and Data Systems \Y2 



(I) (2) 



A 


AA 




B 


Ab 



SPN I5090-VA-35 



Figure 8-2. Sign Bits of Two Double Precision Numbers Line Up 

significant are stored in the Q Register. The double precision add time is 
11 microseconds. 
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9. input/output 



9. 1 GENERAL 

In general, the I/O operations in the SOLOMON II System are quite conven- 
tional in that data channels are used to transmit data between central memory 
and the standard peripheral devices such as magnetic tape, disk, high speed 
printer, and typewriter. However, the transmission of data between central 
memory and the PE is rather unique. This transmission of data is accom- 
plished by a serial-to-parallel converter called the L-Buffer. The actual 
details of transferring data is covered in SOLOMON Technical Memorandum 
No. 25, "SOLOMON II Physical Characteristics. " 

To store information into the PE memories, the information must first 
be read into central memory by tape, punched cards, etc. From the central 
memory, it is transferred to the L-Buffer and from there to the PE memo- 
ries. 

The output from the PE Network process is the exact reverse procedure. 
The desired information is transferred from the PE memories to the L- 
Buffer and from there to the printer or other peripheral devices. 
9.2 CONTOUR LINES 

The final output for many forecasting models are contour lines of weather 
parameters such as pressure and temperature. For example, at the end of 
the 500-mb forecast, the desired output consists of the 500-mb height field. 
Assume that it is desired to obtain the contour for 500-mb surface at 19, 000 ± 
±50 feet. The following series of instructions can be used: 



MSI 



Set all PE's to Mode 



TMB 



Load Broadcast Register with 19,000 
Load P with height 



TRP 
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SR 



B 



(P)-(B)-P 



MS LA 



TMB 



Load Broadcast Register with 50 

Set all PE's in which fPl < B to Mode 2 



Each PE has two flip-flops to store the mode status and for this case any PE 
which had answered yes would have a (lO)^ combination in its mode flip- 
flops,, The Mode 2 PE's will define the contour line and this line can be dis- 
played on a TV tube by the use of a display device. The same procedure can 
be used to obtain the remaining contour lines of the height field and then a 
picture of the TV tube can be taken for the final output. This process of dis- 
playing contour lines could be a continuous process and would allow the Mete- 
orologist to observe the progress of the forecast as it is being generated. 
This procedure would seem to be a powerful tool for a Meteorologist working 
to experiment with different models, different finite differencing approxima- 
tions, or the general heating of the atmosphere. 

The contour output scheme should also provide a speed advantage over the 
present method of feeding the data to an analog plotter. However, no time 
estimate is available at this time. 
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10. VARIABLE GEOMETRY 



Three alternative configurations, described below, are available in addi- 
tion to the planar one described previously. The choice of the 4 possible 
configurations is made by the programmer. (Additional configurations are 
available as a special option.) 

By joining the east and west edges of the planar array, the network be- 
comes cylindrical, with open north and south ends. Using figure 6-1 as an 
example, a cylinder would be formed by connecting the following pairs of 
PE's: (0, 3) and (0, 0), (1, 3) and (1, 0), (2, 3) and (2, 0), (3, 3) and (3 , 0) . 

Similarly, by joining the north and south edges of the planar array, a 
cylinder with open east and west ends is formed. 

If the connections specified in the last two options are made at the same 
time, the figure becomes a torus. In this case, every PE in the network has 
four nearest neighbors. 

The possibility of selecting one of several configurations, depending on the 
problem, is attractive. 

For example, in a physical problem which is most conveniently described 
in cylindrical coordinates, the connection between the two opposite edges of 
the planar array is automatically made. Without this provision, program 
steps would have to be written to create this connection artificially. To join 
the most easterly and most westerly columns by programming would involve 
the following: every time all the PE's consult their western neighbors, the 
same instruction can be executed by the PE's in all columns except the most 
westerly one. This column's west neighbors are in the most easterly column. 
At least two more instructions (as seen later) would be required to get the 
required neighborly information from the east side of the square array to the 
west side. 
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11. COMMUNICATIONS BETWEEN UNITS (NCU, GPC, NETWORK) 



Communications between the Network, the Network Control Unit, and the 
General Purpose Unit are outlined by the schematic diagram shown in figure 
11-1. 



NCU 



GPC 



N NM 



40 B 




MI 



40 B 



M2 



SPNI5090-VA-36 



Figure 11-1. Routes of Access Between NCU, GPU, and Network 



In the diagram above: 



N, NM - PE Network and PE memory, respectively 
GPC - General Purpose Control 

Ml - Memory for NCU (40 bits) 

M2 - Memory for GPC (40 bits) 

A - Arithmetic Unit (associated with GPC) Both A and GPC 

together comprise the General Purpose Unit. 
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As seen from the above diagram, direct access is available for the NCU 
and GPC to each other's memory. This implies that the NCU is able to com- 
municate directly with the GPC memory and conversely, the GPU is able to 
communicate directly with the NCU memory. Communication with respect 
to the Network is supplied only through the Network Control Unit. The PE 
Network does not have access to the General Purpose Unit directly but must 
communicate through the Network Control unit, then to the GPU by the in- 
struction HELP. All data transfer from or to the PE Network must take place 
through the NCU (via either the L-Buffer or Broadcast Register), Also, all 
indexing instructions to the PE Network are executed and controlled by the 
NCU only. 
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12. NUMBER OF POINTS PER PROCESSING ELEMENT 



Each PE is capable of handling any desired number of grid points on a 
finite difference mesh (within the required storage limits for those points),, 
In general, this grid point - PE allocation is determined mainly by the re- 
quirements (size) of the finite difference network being solved and the number 
of Processing Elements available (for this report 32 x 32 or 1024 Processing 
Elements). The allocation can be as small as one grid point per PE or many 
grid points per PE subject to the storage limitations of those points. The 
only restriction imposed on this grid point allocation other than storage is 
that it be uniform throughout the entire Network, otherwise some special 
programming considerations must be applied. 

It might be mentioned here that as the number of grid points per PE in- 
creases, it becomes more desirable to use some form of a loop to avoid the 
tedious task of straight line coding for all points, provided the same instruc- 
tions apply at each point. This technique will result in a slight time loss 
due to the indexing instructions necessary to control the loop but could be 
very valuable in situations where the grid point allocation per PE is very 
large. In writing a loop for a set of points with a Processing Element, there 
are two factors which must be considered. 

a. The routing must be changed for various points (implies modification 
of the routing field in the instruction), 

b. The positions with the PE must be incremented (implies indexing 
of positions). 

Of the two factors given above, changing the routing appears to be the most 
formidable. This can be handled by either of the following two methods: 
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a. The routing can be indexed by indirect addressing (SOLOMON Proj- 
ect of Technical Memorandum No. 23 "SOLOMON II Programmers' Reference 
Manual"). This method consists of placing all of the required addresses to 
be used for each position in a sequential table in Central Control Memory 
and using indexing controls on the PE Network to provide the correct ad- 
dresses associated with each position in turn This corresponds to a "table 
look-up" and could result in a slight loss of time since the PE Network may 
be inactive while the fetching procedure of the correct addresses is being 
performed. 

b. Indexing the routing can be avoided by simply coding a separate 
subroutine for each position (or groups of positions), controllable by index- 
ing from Central Control. Each separate subroutine would contain the neces- 
sary operations and routings required for the calculations to be performed 

at that particular position. For the case where there are many points per 
Processing Element, a single subroutine can be written to take care of all 
interior points within the PE while the necessary subroutines can be written 
to account for the peripheral PE positions (one for each corner and one for 
each side containing other than corner points). In general, the corner points 
will require special treatment since it is at these positions where the greatest 
variation in routing will occur. An example of this type of program for the 
eight-point-per-Processing-Element case is illustrated in the coding for the 
initialization section of the sample forecast problem which is presented later 
in this memorandum. 
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13. ASSOCIATIVE MEMORY PROPERTIES 



The purpose of this section is threefold: to assign a definition unique to 
associative memory programs, to develop and illustrate an example of as- 
sociative memory programs, and to provide a flow chart (see figure 13-1) 
and program coding for a more comprehensive study. 
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Figure 13-1. Associative Memory Flow Chart 

To define associative memory programs, consider a normal program 
(with or without parallel processing). The program will call for specified 
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operations to be performed which involve specified values stored at specified 
memory locations. In contrast, associative memory programs may utilize 
a parallel process machine to perform unknown functions with unknown data 
stored in unknown memory locations. The term unknown here implies that 
the three involved factors are not rigidly designed into the program and are 
not determinable through visual examination of the program coding. During 
the development of the model program, the above mentioned three unknowns 
will be reiterated as they become a set of interrelated factors which establish 
the function and effect of the program. 

In examining the exact function or purpose of the model program, consider 
that it may represent one facet of a complex integrated system of data proc- 
essing rather than being complete and independent. That is to say, the pro- 
gram will represent to an experienced programmer or systems analyst a 
simple example of data maneuvers and comparisons which are now ordinarily 
handled in a very awkward and time consuming manner. Supporting this state- 
ment, the several major accomplishments of the program will be listed: 

a. Simultaneous magnitude or similarity comparison of a large set of 
data words to some controlling factor (herein called the master word). This 
corresponds to the first function block on the attached flow chart. 

b. Categorizing of the data words into one of the following four groups: 

(1) Data word = master word 

(2) Data word > master word 

(3) Data word < master word 

(4) Greatest similarity based on matching bits or absence of 
bits in corresponding bit positions. 

c. Controlling which classes of data, or combination of classes, will 
be assigned to further processing and which classes are not to be retained 
as unique categories for selection. 

Consider that the model program will first compare and categorize each 
data word (hereafter called DW and stored in processing element memory) 
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by a few simple steps relating the DW and the master word (MW). First 
simultaneous subtraction is performed and the difference MW-DW is stored 
back into each PE. The magnitude of the difference is examined to establish 
one of the following three relationships: 

a. MW-DW = 0; Equality 

b. MW-DW < 0; DW > MW 

c. MW-DW > 0; DW < MW 

Actually, the first two conditions cited are tested for and the third is assumed 
through this elimination process. This represents the operations from point 
A to point B on the flow chart; and as the chart points out, the data is being 
categorized during these tests by assigning one of four PE mode settings to 
each group of PE's. The operations begin with all PE's set to mode zero 
and the following modes are assigned to represent each group: 

a. Equality; Mode 3 

b. DW > MW; Mode 2 

c. DW < MW; Mode 1 

Note that at this point the SOLOMON II has used a half dozen instructions to 
compare and categorize 1024 data words in its PE's by relating them to the 
controlling MW. Mode selection would now be capable of assigning any or all 
of the data groups to one or more subroutines. The similarity test has not 
been made at this point since it would have no meaning when used with the 
above three tests. Equality would obviously mean complete similarity. How- 
ever, a lesser than word and a greater than word may be equally similar to 
the master word. 

The program now utilizes 3 of the 40 sense switches set from the 
SOLOMON II Console Panel to determine whether the data groups should re- 
tain their unique mode setting or be returned to Mode 0. A 4th switch is 
tested to see if the similarity test is desired. These processes begin at point 
B on the flow chart. 
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At point D, an interesting similarity test is achieved by simultaneously 
creating a word containing bits in all matching digits and then adding the bits 
to determine which word or words are most similar to the master word. 

First the matching bits are established by the Boolean AND function and 
the resulting word is stored. MW + DW is then performed as an OR function. 
This is complemented and another OR adds this and the result of the AND 
operations. The results contain bits in all matching digits. 

Having identified similarity by PE Mode State 3, the program would now 
make the final of several possible exits to the switch test rendezvous point. 
This corresponds to point RONDO on the flow chart. At this point consider 
the associative memory program in the light of the original definition. The 
unknown function is satisfied by the ambiguity of DW magnitude, sense switch 
settings, mode states, etc. The unknown data is the undefined contents of 
10Z4 PE data words. Finally, the unknown locations are illustrated in that, 
at point RONDO or other points, data catagories based on magnitude or simi- 
larity could be assigned an exit into subsequent processing unique to their 
classification without directly addressing a specific location or set of location; 
in the computer memory. 
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14. SPECIAL REGISTERS 



The SOLOMON II System has the following special registers available: 
M(j) 



CG 

RG 
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SL 

SS 

DL 

DS 



Machine Option Register (geometric control and edge 
connections) 

Column Geometric Register 
Row Geometric Register 

Broadcast Register (Option No. 1 or No. 2) 
Sense Lights Register 
Sense Switch Register 

Data Lights on Arithmetic Unit Register 
Data Switches on Arithmetic Unit Register 



The SL and SS Registers are located in the following positions: 





The DL and DS Registers are located in the following positions: 

« 40 
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15. SOLOMON II CODING OF BAROTROPIC MODEL WEATHER STUDY 



This section deals with a recent weather study program written for the 
SOLOMON II System. The new program is a follow-up to the SOLOMON 
program presented for the Barotropic Model; Technical Memorandum No. 14, 
6/12/63, (see flow chart figure 15-1). Unlike the SOLOMON System, 
SOLOMON II is a single address machine of fixed word length (20 bits). Add 
time has been decreased by a factor of 10 and multiply time by more than a 
factor of 42. New instructions have been added and the mnemonics of the old 
instructions have been changed. In short, the present program is consider- 
ably different from the old one. 

The data for the first data point of each PE is programmed and included 
in this section. Programming for the remaining points is similiar. In the 
previous program, a four-point-per-PE allocation was used. The same is 
true for this case. 

The execution time for the present program is obtained by multiplying the 
times for the appropriate sets of instructions by 4 and adding in the times 
for operations pertaining to all points. The computing time for the IBM 7090 
was cited in the previous report as approximately 1.61 x 10^ microseconds. 
The SOLOMON II computing time for this case is 804 microseconds. This 
gives an overall speed ratio of more than 2000/1 in favor of the SOLOMON II 
System. 

In this program, an attempt was made to keep the same variable names 

for quantities as were used in the previous report. On the first page, n is 

computed for point No. 1. The second page has the instructions for comput- 
2 k+1 k 

ing J (r7,$), v and $ for point No. 1. The subscripts are adjusted in 
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Figure 15-1. Flow Chart 



15-2 



15090A 



Computer and Data Systems 



the variables for later use. The relaxation process is programmed for 
the first point on page 3. The fourth page contains the test for convergence 
in all PE's while the last page lists the instructions for resetting the sub- 
scripts. The execution times for individual instructions and sets of instruc 
tions are listed in the program. 
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16. SOLOMON II AS APPLIED TO A PRIMITIVE EQUATION MODEL 



This section deals with the handling of a primitive equation model by the 
SOLOMON II System in connection with numerical weather forecasting. This 
primitive equation model, while being of a simple nature, was felt to ade- 
quately illustrate many of the various types of computation typically required 
by atmospheric circulation models in general. Included will be the following 
main topics: (a) presentation of the model and the method of solution; (b) 
detailed coding of this model on the SOLOMON II System with time estimate; 
and (c) a full discussion on the procedures used by the SOLOMON II Sys- 
tem. 

For purposes of this report, emphasis is being placed on the finite dif- 
ference method of integrating the hydrodynamic equations. However, some 
attention will be given to an alternate method of prediction which involves 
computations in wave number space. This method of prediction will, in 
general, require a much different scheme than that using the finite difference 
technique, since a given wave component is capable of interacting with many 
other wave components rather than merely the nearest four or eight neigh- 
bors. 

In the first SOLOMON technical report on Numerical Weather Prediction 
(SOLOMON Project Technical Memorandum No. 14) the Barotropic Model a 
two address machine was presented and its handling of the problem described. 
This report deals with the SOLOMON II System which is a single address 
machine but whose instruction execution times have been significantly reduced. 
16. 1 BACKGROUND INFORMATION 

Research in the field of numerical weather forecasting has provided 
methods of daily routine weather prediction which have given reasonably good 
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results in forecasting the upper and surface flow patterns, despite various 
simplifying assumptions in order to make the dynamic equations solvable. 
These methods have largely consisted of taking the basic equations of motion 
and continuity, utilizing assumptions (such as the hydrostatic and geostrophic 
equations) and finally filtering the equations to remove high speed and fre- 
quency gravity and sound waves from the solutions obtained. This last step 
was considered necessary to allow a longer time step in the forecast equa- 
tion or equations and still retain computational stability. This filtering can 
take on several forms depending on the type of filtering desired. In fact, 
two common filtering procedures are the two equations mentioned above, the 
hydrostatic and the geostrophic equations. Other filtering procedures are 
the utilization of various appropriate smoothing operators which are applied 
to the derivatives of the dependent variables and sometimes to the variables 
themselves, chosen in such a way as to eliminate the high frequency waves 
and still retain the solutions containing the longer and more meteorologically 
significant (Rossby) waves, unchanged. In general, these computations were 
carried out on a hemispherical rather than a global scale. 

However at present, more and more attention is being turned to the 
"Primitive Equations" (the basic hydrodynamic equations of the atmospheric 
circulation in their original form without filtering). These primitive equa- 
tions take on different forms depending upon the model being used but basically 
attempt to represent the following atmospheric effects: 

a. Friction effects on the fluid motion caused by ground and wind 
shear (i. e. , air and ground interface; air to air interface due to wind shear) 

b. Heating effects on the general circulation. These effects can be 
divided up into the following catagories: 

(1) Short wave radiation (radiation which enters the atmosphere 
and is absorbed by ground and air) 

(2) Long wave radiation (radiation which leaves earth's surface, 
part of which is absorbed by atmosphere) 
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(3) Boundary layer convection (small scale convection at bound- 
ary layers) 

(4) Tropospheric convection (larger scale convection throughout 
atmosphere) 

c. Atmospheric moisture content (Heat of evaporation and or conden- 



d. Terrain effects (topography) 

e. Vertical advection (motion or flux in the vertical) 

In a more complete general circulation model, the above mentioned effects 
are accounted for in part by the following descriptive equations: 

a. The equations of motion (frictional effects, terrain effects, vertical 
motion) 

b. Equation of continuity (vertical advection, surface pressure ten- 
dency) 

c. Thermodynamic equation (atmospheric heating by all processes 
except condensation/evaporation) 

d. Equation of moisture (heat gain or loss due to condensation/eva- 



The primitive equations, although in existence for many years, have never 
(until recently) been considered for daily numerical weather prediction for 
four important reasons. These reasons are: 

a. The primitive equations require a large data accumulation and in- 
put, for several levels. The present system of data collection is not nearly 
extensive enough to permit an accurate reconstruction of the initial state of 
the atmosphere, particularly for the large number of levels required for 
most models presently under consideration. However for research consi- 
derations, this problem is handled by simply setting the general circulation 
in a state of rest (i. e. , calm winds, constant temperature, constant pres- 
sure in the horizontal and constant variation with height, no vertical motion, 
no heating, throughout the atmosphere), applying the heating effects, and 
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finally observing the resulting circulation patterns caused by the heating 
induced. In theory, this process would be repeated for several variations of 
heating distributions. For routine forecasting, a minimum of levels must be 
used to utilize the limited data now available for analysis. For research 
work, this presents no problem, since most or all of the work is done on a 
theoretical basis. 

b. Since the primitive equations are essentially unfiltered ( short - 
high frequency waves retained), the time step to ensure computational 
stability is necessarily much smaller than in the filtered equations. In the 
filtered equations, it is possible to use approximately a l/2-hour time step 
and still obtain a nondiverging solution. In the primative equations, this 
time step must be cut down to 5 or at the most 10 minutes to ensure a stable 
computation. Obviously, the usage of so short a time step requires much 
more computation to obtain a forecast for the same future time period than 

in the filtered case. This problem can be solved by utilizing a faster parallel 
type computer capable of handling these computations in a minimum of time. 

c. A third problem is the absolute magnitude of the computations in- 
volved on a hemispherical or global grid. For example, a hemispherical 
grid network consists of 10, 000 grid points. Moreover, there are six levels 
to be considered in the model plus the surface. This gives a total of (10, 000) 
(7) or 70, 000 grid points to be considered. At each grid point, we are to 
carry 5 parameters plus storage for certain constants (for each grid point). 
These would include; coriolis parameter, map scale factor, various trigono- 
metric functions - if a change in the coordinate system is desired - and 
certain absolute constants. Also, there must be a working storage to pro- 
vide space for the logical and arithmetic operations to take place at each 
point. The computational problem becomes tremendous even for a fairly 
simple primitive equation model. It is clear that a computer of a parallel 
nature is extremely desirable particularly when a great many of the com- 
putations involve a large network where the same computation is performed 
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at each point, excluding the boundary points. These are treated separately 
as explained in SOLOMON Project Technical Memorandum No. 14. 

d. Direct measurement of the heating effects is extremely difficult 
if not impossible at present. Hence researchers are led to empirical for- 
mulae for the various types of heating effects or must leave them out 
entirely. Hopefully, a faster and more efficient computer will allow re- 
searchers to discover and formulate more realistic expressions for the rate 
of change of heating as it actually occurs in the real atmosphere. 
16. 2 THE PRIMITIVE EQUATION MODEL 

As a preliminary consideration on the primitive equations, it was felt 
that a three parameter model would suffice for the sake of simplicity. This 
model will illustrate many of the computations involved in a primitive equa- 
tion study without considering full scale atmospheric effects. As an added 
feature, a somewhat different finite differencing scheme will be considered 
to illustrated the flexibility of the SOLOMON II System. This finite differenc- 
ing scheme (Eliassen and Platzman) involves essentially a staggered grid 
system which employs not only the nearest four neighboring grid points but 
also the four corner points as well, for the finite difference formulation of 
the dependent variables and their derivatives. The usual centered time, 
center space finite differencing method will be used, conventional in many 
forecast problems of this nature. 

The primitive equation model chosen is one containing three dependent 
variables and having a free surface, otherwise known as the Hydrostatic 
System. This model has the following simplifying assumptions: 

a. Hydrostatic balance 

b. No frictional effects 

c. Homogeneous and incompressible atmosphere, with a free surface 

d. Heating effects have been disregarded 

From the basic vector equation of motion in the atmosphere: 



dV 
dt 



=-aVp-2ftxV+g 
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the following scalar equations of motion in rectangular coordinates can be 
derived 

at ax dY ax 

fL + M + v^ + fu + ^ = o 
at ax dY dY 

The third equation necessary to solve this system is the continuity equation: 
1 dp 

- = - V • V 

p dt 3 

But for our system using the assumptions of incompres sibility and free 
surface, the continuity equation can be put into the form: 

d<t> 4. it ^ , v d4> . , (dV dV\ 

where 

U = horizontal component of velocity (west to east) 

V = horizontal component of velocity (south to north) 
X = coordinate distance (west to east) 

V = coordinate distance (south to north) 
= geopotential height, given by g Z 

Z = height of an isobaric surface 
f = coriolis parameter given by 2 £l sin 
£l = angular velocity vector of earth's rotation 
6 = latitude 
a = specific volume 
p = dry air density 
g = gravitational acceleration 

V = dell operator 

p = atmospheric pressure 

V = horizontal velocity vector 

= three dimensional velocity vector 
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For the SOLOMON II System, it is extremely convenient to use the 
spherical coordinate system to ensure proper continuity at the equator re- 
quired by a global model. This has the added feature of allowing mass trans 
port across the equator without the fitting and interpolation procedure of 
rectangular grid systems at the equator. The two polar points where singu- 
larities occur are to be handled separately and in a manner to be outlined 
later. With this adoption, the equations of motion can be expressed in 
spherical coordinates as: 

a X cos 9 = 2 a 9 {X + &) sin 9 - — t 

' a cos 9 oX 

a = - a X cos 9 {X + 2 ft) sin 9 - - |^ 

a oO 

where 

X = longitude 
9 = latitude 

a = radius of earth (taken as a constant) 
ft = angular velocity of earth's rotation 

The total derivative operator is now given by: 

K } dt at + dt + y d9 p ap 

d 

However in this coordinate system the operator vanishes since X and 6 

dp V 

can be taken to be independent of the vertical coordinate p. To provide con- 
sistency throughout the system the following terms have been omitted. 

2 XI a cos 9 - interpreted physically as the vertical component of coriolis 
term 

Zk9 

interpreted physically as inertial terms 

2 X a cos 9 

The equation of continuity in spherical coordinates becomes: 
jf = " 7 ( 4 V) = - ^ (* X) - sin 9 (<f> 9 cos 9) 
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Here and in the equations of motion, differences of gravity with height 
have been neglected. Also for purposes of expanding the above equations, 
the following spherical relationships are available: 

U = a X cos 9 dY = a d 

V = a 9 ax = a cos 9 d^X 

Hence the equations for the model can be expanded into the following form: 

dU | U dU | V au U V tan 9 f 1 dcj) _ 

at a cos 9 dX a 39 a a cos 9 dX 

ay ___u_ ay + v ay u 2 tan | f | 1 a<£ _ 

at a cos dX a. d 9 a a d(9 

dcf> u a</> v a<£ , 1 ay av „ nN „ 

TT + Z rf + — I7T + TT + J-5- - V tan Q) = 

at a cos 9 dX a 00 a cos 9 o X o 9 

In meteorological problems of this nature, it is usually desirable to use 
a conformal mapping transformation in order to map the solution from a 
curved to a flat surface. For this particular case, a Mercator Projection 
was selected for convenience at the equator. This transformation can be 
made by the following relationships: 

X = a X Y=-a^i 

v < a ■» j- ~ 1 + V = a = M _1 Y 

V = a X cos 9 = M X 

h = cos 9 (1 + sin 0)" 1 
M = sec 

Putting these relationships into the spherical equations yields the following 
system expressed in map coordinates: 
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16. 3 FINITE DIFFERENCES 

As was mentioned previously, the finite differencing scheme to be used 
in this illustration is the one proposed by Eliassen and Platzman, the 
"staggered grid" system. Basically, this system consists of "staggering" 
the dependent variables in a systematic manner over the entire network. This 
is illustrated by the following diagram. Note that not all parameters are 
stored at all points in the network but are staggered in a definite pattern 
throughout the entire grid system (see figure 16-1). 







•* 


• v 




• 


o u 


• 


•+ 


• v 


•* 






• 


• u 

SPN 13090 


• 

-VA-39 



Figure 16-1. Staggered Grid System 



Another feature added to this scheme was to systematically stagger the 
dependent variables in time as well as in space. This was done by setting 

for each integral time step (i. e. , 0, At, 2 At, 3 At, n At) the grid 

distribution shown above for unprimed variables U, V, and (f). For the l/Z 
integral time steps (i.e., l/2 At, 3/2 At, 5/2 At, (1 + l/2 At) the grid 

! f I 

distribution shown below for the primed variables, U , V , and was used. 
Hence, there are two variable distributions depending upon time step being 
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considered. The dependent variables alternate positions on the grid with 
each l/2 integral time step. This system has been shown to be more ef- 
ficient than a grid system having all dependent variables stored at all grid 
points for all times. The variation of this grid system with space and time 
is shown in figure 16-2 below. 

For the purpose of this problem, the following distributions were used 
for each variable being computed. From these, the equivalent staggered 
grid finite difference equations for the hydrostatic system were derived. 
There are two sets of equations with three per set; one set for the unprimed 
variables and the other set for the primed variables. These are as follows: 

(A = grid size increment) 



uv' 0* uv' -11 0°1 11 

u'v -10 00 10 

uv' 0' uv' -1-1 0-1 1-1 



' ' At r 1 r 

U 00n = U 0On-l - M 00 ( X) {*10^-10 + 4 [ (U 1 1 + U - 1 1> (U 1 1 " U - 1 1> 

+ (u i-i +u _i-i>< u i-i - u -i-i)] 4 v oo (u n - u i-i + u -n 

M 

- u -i-i>} + v oo At f oo [ 1 + aol (u -n + u i-i + u n + u -i-i>] 



u'v u'v -11 01 1*1 

4> uv' 0* -lb 6o lb 

u'v u'v -1-1 0-1 1-1 



11 At r 1 

V 00n = V 00n-l - M 00 ( X> { *0 1 " V 1 + 2 U Q0 ( V l 1 " V - 1 1 + V l - 1 

- v. ul ) + \ [tv n + v^) (v u - Vj.,) + (V_ u * V^.j) (v_ u 

M 

- V _ _)1 \ - U At f (1 + -t^- V n J 

-1-1' J J 00 00 2Aia 00' 

u'v -11 01 11 

uv' 0* uv' -10 0*0 10 

u'v -1-1 0-1 1-1 
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Figure 16-2. Staggered Grid System 
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It is worth mentioning that the finite differencing scheme used in this re- 
port (and subsequent coding) is one typically used in general circulation com- 
putations and experimentation. This is particularly true when the spherical- 
polar coordinate system is used, due to the convergence of the longitude lines 
near or at the polar regions. When making use of such a coordinate system, 
it is particularly desirable to use a finite differencing scheme which is not 
affected appreciably by grid size and still retain computational stability. 
Since, generally, the governing equations of a general circulation model are 
nonlinear in nature, the criterion for computational stability becomes even 
more critical than for the linear case. 

As it turns out, the type of instability which has hampered general cir- 
culation computation the most is the so-called nonlinear computational in- 
stability of the finite difference analogue representing the physical system 
under consideration. This type of instability, upon investigation, was found 
to have the following characteristics: 

a. This instability appears only in nonlinear finite difference equations 
(finite difference analogue of the nonlinear partial differential equations) 

b. This type of instability cannot be avoided or postponed by simply 
reducing the time interval used in the forecast equations. 

c. The occurrence of this type of instability is, for all practical pur- 
poses, independent of the grid size being used. 

Extensive research has been conducted in this area and schemes for 
eliminating this type of instability have been proposed for various situations. 
Fundamentally, these finite differencing methods are based on energy con- 
straints imposed on the entire system (i. e. , conservation of energy, kinetic 
and potential, or conservation of mean square vorticity). The staggered grid 
system plays an important part in these schemes when applied to the primi- 
tive equation models. 

For the hydrostatic system, the sum of both potential and kinetic energy 
must be approximately conserved. From the above consideration, the 
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staggered grid technique used can be shown to be sufficient along with the 
usual centered time and spatial finite differencing forms to retain long- 
range nonlinear computational stability and continuity. The staggered grid 
system has other distinct advantages since it requires less storage than the 
usual types of grid system allocations and finally allows twice as many grid 
points for roughly the same amount of computation. 
16. 4 INITIAL DATA 

To initiate the computational procedure, it is necessary to obtain an 
initial wind field (u, v) along with a geopotential height field for the given 
synoptic situation. Since, in general, upper wind data are either not avail- 
able or are not sufficiently accurate for numerical forecasting purposes, a 
theoretical method must be used to obtain these data initially. It is possible 
using geostrophic and hydrostatic considerations to express a theorectical 
wind field that is in a state of balance with the existing geopotential height 
field. To do this, it is necessary to introduce the concept of a stream func- 
tion or in simple terms, an air particle (wind) trajectory field. The balance 
equation relating the stream function and its corresponding geopotential 
height using the above assumptions, is: 



f = Coriolis parameter 

\\) = stream function 

V= horizontal dell operator 

= geopotential height (obtainable from a weather map or upper atmos- 
pheric data) 

k = unit vector in the vertical 

The wind field can then be expressed in terms of this stream function by the 
formulae: 

V = k x V ^ 




where 
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or in component form: 
U =-dj V = 4j 

y x 

For many situations in numerical weather forecasting, a good degree of 
accuracy and realism is retained by omitting the nonlinear terms in the 
balance equation. Hence to a good degree of approximation, the initial stream 
function can be determined from the linear balance equation: 

V- (f V ijj) = v 2 

and the wind field as before by 

U =-dj V = Oj 

y x 

For consistency in this model, these relations for initialization must be 
transformed into spherical and Mercator Coordinates respectively. For 
the spherical transformation, the following identities can be employed: 

a cos a cos ft d A. 

2 , 

2 , 1 d / n d4 \ , 1 d <ft 

v *=-z 7 (cos0 aT )+ ^ F~" a~~T 

a cos 6/ a cos ft A. 

Vf • V + = ^ [sec _ _ + _ _ j 
The linear balance equation in spherical coordinates is then: 



La cc 



d_ {cose l± )+ L a ^ 



2 " + I <Li d± 

a. dd d6 



a d0 39 2 2 a v^ 2 

cos y a cos V a A. 



a / i a 2 4> a f 

TE < cos 61 JT) + ~, o T smce = 



2 . de v ao 7 2 2 n 2 A a\ 

a cos y a cos 9 o\ 



The transformed wind relations are: 



u = . I a±. v = 1 «i 

a a ^ a cos a ^ 
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Finally, with the aid of the Mercator Transformations, the system can be 
cast into the form: 

2 2 

il ii + f + <L_i\ - d * + _^ £ 

fly dy M 2 2 ; 2 2 

ay a x dy 

cos 9 dY cos dx 

The linear balance equation will be solved by a method equivalent to the 
Liebmann relaxation technique which consists of computing a residual from 
an initial guess, then applying a corrector formula (over-or-under-relaxation) 
to reduce the residual at each point as much as possible. This procedure is 
repeated until the residual is reduced to less than or equal to specified limit 
(measure of error between computed and true solution). For this situation, 
the residual is computed by the formula: 

_m / , m ,m ,m+l ,m+l „ ,m 

r. . =1.(4'. + ^.^i • + ^- i . + 4»- - i - 4 4». . 
ij ij ^ i, j+i i+i, j i-i, j ij-i ij 

+ (f.. - f.. J (4j. . - 4j.. J - F.. 

ij+i ij-r VY i, j+i ij 

where 

m = order of approximation 

F. . = 6. . . + 0. . . + 0. . . + 0. .,-40.. 
ij r i+l, j y i-l, j r ij+l i, j-1 ij 

The predictor - corrector formula is given by: 

, m+1 , m a m 
ip. . = \\>.. + -r R.. 
ij iJ 4 ij 

The test for convergence to the true solution can be written in the form 



e - 



where 



R > 
ij I 



a is a relaxation coefficient generally in the range: 0. 2 < a < 2 
e is a predetermined convergence criterion 
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The procedure given above is repeated until the convergence criterion 
given above is met at all points or until it has cycled a stipulated number of 
times (implies nonconvergence to solution). To initiate the relaxation pro- 
cedure, a guess function is used. This consists, for this case, of the 
geostrophic relation: 

Once the stream function has been found, the wind field is then obtained by 
the finite difference formulas: 



a XY i+i, j T i-i, y 

where for this example, both the stream function and the wind components 
are specified at the poles. 

From the initialization procedure, the initial fields of U, V, and </> are 
obtained. To start the forecast, the following conditions are used: at time 
t = 

U = u' 
v = v' 

Now use the finite difference equations to solve for the primed variables at 
time t = l/2 At. This is done by making the following substitution: 



1 

U 



00n-l = U(t = °) 



I 

V 



00n-l= V(t = > 
Wl = *' (t = > 

and replacing At by At/2 for the first time step. All time steps thereafter, 
the normal finite difference equations are used, solving for the unprimed 
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and primed variables in succession. This is done for all parameters U, V, 

it i 
and </>; U , V , and . 

16. 5 AVERAGING PROCEDURE AT POLES 

As was mentioned previously, a special treatment is required at the polar 
regions where singularities occur in this system. In this problem, new 
polar values for each variable are generated by averaging the grid point 
values closest to the pole (i. e. , nearest two sets of latitude points) contain- 
ing that particular variable. This is done after new values for that particu- 
lar variable have been computed at all interior points. The new averaged 
value is then substituted in place of the old one. Expressed mathematically, 
this is: 

N N 

VI V N VI 

i = 1 i = 1 



U 



N N 

I V N Vl V N 

i = 1 i = 1 



N N 

Vl V N Vl V N 

i = 1 i = 1 

Where N = number of grid points containing a particular variable. Note, 
due to the staggered grid configuration, N will, in all cases, be equal to l/2 
the total number of grid points contained along a given latitude line (for this 
example N = 64). Also, due to the variable allocation on this grid system, 
this averaging procedure will take place on either the 1st or 2nd latitude 
circle nearest the pole but not on both simultaneously for a given variable. 
16. 6 SOLOMON II SYSTEM 

This section will deal with the actual implementation of the SOLOMON II 
System to this particular problem. The concept of the Parallel Processing 
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Element array as applied to a simple forecasting model has already been 
presented in SOLOMON Project Technical Memorandum No. 14. For this 
case, the grid point allocation was set to four points per Processing Ele- 
ment. Since, in general, the current Processing Element array being con- 
sidered is a 32 x 32 array, it was decided to set 8 staggered grid points per 
Processing Element. It was found that this allocation (4 latitude points across 
and 2 longitude points down per PE) would give a resolution of 2. 8125 degrees. 
This would allow a grid network of 128 points around a latitude circle and 64 
grid points along a longitude line from pole to pole. It was felt that this 
resolution was adequate for the purposes of this numerical forecasting prob- 
lem. According to the staggered grid system, the grid point distribution per 
PE was used as shown in figure 16-3. 
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Figure 16-3. Grid Point - PE Allocation 
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With this allocation, the computations are programmed for each point within 
the PE. Due to the dependent variable distribution, the computations will 
not be the same for each point within the PE but will be the same for cor- 
responding points in all other PE's in the network. Hence, the concept of 
parallel computation applies equally well to a staggered grid system as well 
as a conventional one. Therefore, the finite difference equations for this 
physical system are applied to their respective points in each PE until the 
computations are performed at every point within the PE (and hence all PE's). 
16. 6. 1 Computing North and South Pole Average Values 

Once each half time step, new average values of the U, V, and (j) values 
in both the 0^ and 31st rows must be computed for use as north and south 
pole values. What must be formed are 6 sums each with 64 summands. 
Several methods of achieving this result have been examined; the fastest one 
found is shown in the following program. Rows 0, 1, and 2 are used to com- 
pute U, V, and PHI , respectively, for use as south pole values; rows 31, 30, 
and 29 are used to compute U, V, and PHI for use as north pole values, 

The program has two parts. The first, given on page 1, computes the 
average of the pair of values stored within each PE and distributes these 
initial averages across rows 0, 1, and 2 and 31, 30, and 29. The second 
part, given on pages 2 and 3, computes the average of the 32 values stored a 
across each of these rows and places the 6 average values in the L-Buffer„ 
The program is written to minimize loss of significance; shifts are performed 
only after additions, and overflow is compensated for. Note here that in 
those operations that indicate an interest in overflow detection, AP0 for ex- 
ample, the last mode given is that to which PE's are to be set if they produce 
a result which overflows. Thus, the instruction 

AP0 20 TO 

says, all PE's in mode 2, add the contents of TO to the contents of P; PE's 
in which overflow occurs, go to mode 0. 
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16. 6. 2 Method of Polar Data Transfer 

The method providing access to polar data becomes an important consi- 
deration when actual programming is attempted. Due to the nature of the 
SOLOMON II System, there are two options which can be made available 
using the Broadcast Register. These are: to provide access to all PE's 
when a constant is required by the Network, and to provide a variable to the 
north row of PE's (row 0) when a north routing is required while all other 
rows obtain their variable from the row of PE's above; e.g., row would 
receive its value from the B Register but rows 1, 2 . . . 31 would receive 
their data from rows 0, 1, ... 30. A similar situation exists for a south 
routing. To obtain this special feature, the programmer must give up the 
horizontal cylindrical connection. This last option will allow instructions 
requiring north or south routings to be executed for all PE's including those 
nearest the poles without a transfer of polar data from the Broadcast Reg- 
ister to a temporary location in those PE's concerned. This last feature 
will simplify the programming considerably for Network problems in gen- 
eral. 

In regard to the transfer of PE data to the Network Control Unit, the 
L-Buffer will be employed. The required averaged data from the PE's will, 
upon instruction, transfer to an address in the L-Buffer Memory. The Net- 
work Control Unit will then, upon instruction, retrieve these data from the 
L-Buffer Memory and store the averaged values in the NCU Memory so they 
can be loaded into the B register when required. The hardware used in 
transferring plus the PE Network itself is shown in figure 16-4. 

In general, global circulation models require cyclic continuity in the con- 
stant latitude direction. This implies that the values at the right and left 
hand (east and west) boundaries must be equivalent at all points on the 
boundary for all times. This condition is well simulated on the SOLOMON 
II System by simply connecting the PE Network in the form of a cylinder, 
with the top and bottom set as the north and south poles, respectively. This 
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Figure 16-4. Configuration of Network 
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configuration ensures the cyclic continuity required by global or hemispheri- 
cal forecasts on the western and eastern boundaries. As has been stated 
previously, the Network Control Unit (and the Broadcast Register) will serve 
as the top and bottom of the cylinder. 

16. 6. 3 Usage of the L-Buffer in Both Polar and Latitude Data Transfer 
In certain portions of the following SOLOMON II Program, advantage 
was taken of certain parameters which were a function of latitude only (i. e, , 
constant for a given latitude line). In these instances, the L-Buffer can be 
employed to provide each latitude line (row of PE's)with such factors as the 
coriolis parameter and map scale factor when they are required by a given 
PE instruction. Upon instruction, these parameters can be transferred via 
the L-Buffer into the PE Network, thereby saving storage within the PE Net- 
work itself since for a given row of PE's (at each position taken one at a time) 
these quantities are constant. 

It has already been demonstrated how the L-Buffer can be used to transfer 
polar data from the Network into the NCU Memory for the averaging process. 
Since this is along columns of PE's rather than rows, this implies that this 
facility can be used in either direction simply by supplying the necessary 
connections from the L-Buffer to both rows and columns of Processing Ele- 
ments. This circuitry is shown in figure 16-5. It can be seen that this 
facility can be an extremely useful tool in models of this sort where the grid 
coordinates lie along lines of constant latitude and longitude. 

When one set of connections (i.e., to PE columns) is being utilized, the 
other set (i.e. , to PE rows) is disconnected (or conversely) by control. 
16.6.4 Computational Procedure 

a. Read in initial geopotential height field, constants, and latitude data 

b. Solve by relaxation the equation 



dY dY •> 2 . z * z . 2 
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Figure 16-5. Diagram of L-Buffer Connections 



for the initial stream function using as an initial guess 
45 



As boundary conditions set 
1 



Np 2 ft ^Np 

1 f 



Np = north pole 
sp = south pole 



Sp 2ft Sp 

c. Solve for the wind field by 



1 d± = _J cKjj. 

cos 9 dy cos 9 
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Take v = u = at poles 

d. Solve the forecast equations 



— = - M (u — + v + -~) + f v (1 + — — — u) 

at ax ay $\ J z& a ' 

av . , , av , a v , a^ v £ , , , m x 

^— = - M (u 7— + V r — + - — ) - f u (1 + — 7r — u) 

at ax dy dy' I & a 7 



a^ 
at 



= - M 



a a f 

— (U 0) + — (V 0) " — FT V 

a x ay a& 



for the desired number of time steps. See figure 16-6. 
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Figure 16-6. Flow Chart for Primitive Equation Model 
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16.7 ASSUMPTIONS IN CODING 

The computational procedure for this forecasting model is given in the 
preceding pages and shown in figure 16-7. It should be stated that this 
sample model does not represent an operational forecasting problem in that 
it has been simplified in the following manner: 

a. It is assumed that the initial geopotential height field has been read 
in and stored in the PE Network. Also constants and parameters, depending 
upon latitude, are stored in Central Control Memory. 

b. This problem has been written in a fixed point mode, but some of 
the detailed scaling operations have been omitted (i. e. , the shift instructions 
necessary to line up the binary points of the two operands, prior to addition 
or subtraction have been left out; also, the necessary shifts to perform the 
required multiplications have been included but the magnitude of the shifts 
have been omitted). 

c. It is assumed that the number of iterations needed for convergence 
of the relaxation procedure do not differ significiently for the SOLOMON II 
System and a sequential type of computer.* 

d. No checking or correcting of overflow conditions has been pro- 
grammed. 

e. The given program represents the main computational parts, 
namely: 

(1) Initialization for stream function (one relaxation) 

(2) (Computation of initial wind field from stream function 

(3) A typical time integration of the six finite difference forecast 

equations. 

And does not include input/ output of data, initial, final, or intermediate. 

f. The running time estimates are based on the following three sections 
of the program. These are: 

* A numerical investigation was conducted on this assumption, and tests 
were made on varying grid sizes for a particular equation. For the re- 
sults and discussion of this experiment see paragraph 17. 2 of this report. 
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(1) Running time for one iteration of the relaxation procedure 

(2) Time for computing initial guess for relaxation procedure 
plus computing wind field from stream function 

(3) Computation time for a typical time integration of the six 
forecast equations (not the first time step). 

16.8 SOLOMON II CODING OF PRIMITIVE EQUATION MODEL 

The coding for this model is divided up into the following sections: 

a. Computation of initial guess function from given geopotential height 

field 

b. Coding of the relaxation procedure required to find the initial 
stream function 

c. Computation of initial wind field from stream function 

d. Computation of forecast variables and averaging procedure at 

poles . 
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To obtain the initial stream function, computations were performed at all 
eight points in the Processing Element (all points in the Network). From this 
stream function, the wind components U and V were then computed for those 
points required by the staggered grid system (in coding, positions No. 2, 
4, 5, 7 in PE's). Also, as a matter of convenience in indexing, the eight 
points per Processing Element were numbered through 7 for the initiali- 
zation section, then renumbered 1 through 8 for the forecasting portion. 

Finally, the quantities fAt, - l/2 N , - M , - fAt , - f n , 

A A oA/.aioA£ 

M 

- fAt -r—n — are assumed to be stored in Central Control Memory. Con- 
2 ii a 

stants for the system are a and € which are broadcast to the PE Network via 
the Broadcast Register. Quantities such as 1/2, 1/4, etc are handled 
simply by shifting the operand the required number of places (left shift-binary 
system). The same applies to factors like 2, 4, etc (right shift-binary 
system) . 

16.8.1 Variable Storage 

a. Initialization 

PHI, 1 (j) , modified by Index Register No. 1 (positions through 7). 

PSI, 1 modified by Index Register No. 1 (position through 7). 

FIJ, 1 Storage for F.. in finite difference balance equation, modified 

by Index Register No. 1 (positions through 7). 

RIJ, 1 Storage for residual R^j 1 computed during relaxation procedure 

modified by Index Register No. 1 (positions through 7). 

f, 1 Coriolis parameter f j4 , modified by Index Register No. 1 

(positions through 7). 

FOUR(/>, OP1, t 
n _ Temporary storage locations 

4ALPHA Relaxation coefficient (constant for all positions) 

EPS Convergence criterion (constant for all positions) 

CHECK Storage location reserved for checking of convergence criterion, 
in PE's 

b. Computation of Wind Field. 

PSI 0-7 ijj at positions through 7 
i 

UPI U at position No. 1 
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VI V at position No. 1 

i 

UP3 U at position No. 3 

V3 V at position No. 3 

U4 U at position No. 4 

i 

VP4 V at position No. 4 

U6 U at position No. 6 

VP6 V at position No. 6 

c. Forecast Section (positions above are now renumbered 1 through 8 
for each Processing Element). 



UP2 


TT at - nnQi'H nn T\To ? 






TTP4 


1 

T T O "f" T"VO G 1 "f" T f~lT"» 1\T f\ A 

KJ cut JJUoltXUIl INUi t: 






VP5 

V -I- —/ 


1 

V at nHQifinn TNJ<~» ^ 

V CtL UU o XLXUXl 1NVJ» ^/ 






VP7 

V x I 


t 

V CLL UUolLlUii INC. 1 






PHTP6 

X X XX X/ \J 


f/» ' pt +■ nfmitinn T\To A 

y CtL L/v O A L X CLJL KJ 






PHIP8 

X X 1X1 KJ 


(ti ' 3t" nn<?iti nn T\To 8 






U5 


U at position No. 5 






U7 


U at nosition No 7 






V2 


V at position No. 2 






V4 


V at position No. 4 






PHI1 


at position No. 1 






PHI3 


(f) at position No. 3 






N 


North routing 






S 


South routing 






E 


East routing 






W 


West routing 






B 


Broadcast Register 






TEMP XY 


Temporary storage , 


at position X, 


number or letter Y 


A, AB 


Temporary storage, 


position No. 


5 


G,GH 


Temporary storage, 


position No. 


2 


K, D 


Temporary storage, 


position No. 


1 


C, CD 


Temporary storage, 


position No. 


2 
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E, EF 


Temporary storage, 


position No. 


5 


L,F 


Temporary storage, 


position No. 


6 


M, MN 


Temporary storage, 


position No. 


4 


O, OP 


Temporary storage, 


position No. 


7 


Cl P 


x c iiiu w j. a. j. y o x cl ti ^ 


nfmition No 


8 


R, RS 


Temporary storage, 


position No. 


7 


W, Y 


Temporary storage, 


position No. 


4 


Z,F 


Temporary storage, 


position No. 


3 



16.8.2 Loop for Computing and ijj.. 

Each PE contains 8 data points, i. e. , 0, 1, . . . , 7. It can be seen from 
figure 16-8 that only the corner points need special handling (i.e., only points 
0,3,4, and 7 require special routing). The expression for F_ is: 

ij y i+l,j l-l, J ^llj + l ij-l ij 

Again it can be seen from figure 16-7 that if i > 4, a south routing is re- 
quired for the evaluation and if i < 3 a north routing is required, (i = index 
for points through 7. ) 

Essentially, the loop is performed under the control of one Index Register. 

The first important test is to determine whether i > 3, (i. e. , determine 

whether a north or south routing is required). Suppose it is found that a 

north routing is required; immediately, it is known that two of the four 

neighbors can be summed, (i.e. , that <f> . . ^ T ) and (j). . can be added. Next, 
& i+4 N i+4 

it must be determined whether i is either or 3 since these 2 points require 
special routing. Three possibilities are present for this case: i=0, < i < 3, 
and i=3. The appropriate case can be determined by the use of index test 
instructions and index increments (see instruction sheets). Suppose it is 
determined that i=0. Then a west routing is required. If i=3 , an east routing 



NOTE: In actual practice, many of these storage locations would be con- 
solidated, thereby reducing considerably the required amount of 
working storage. For purposes of illustration, this was not done. 
Also the absence of a prefix implies internal routing. Finally, all 
above parameters are stored in the PE Network with the exception 
of 4ALPHA and EPS. 
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Figure 16-8. Grid Point - PE Allocation for Initialization 



is required and if i/0, 3 no special routing is necessary (in addition to the 
north routing already determined for the first test). 

At this point it is only necessary that the remaining internal neighboring 
points be added into the previous , sum and for the subtraction of 4</k to be 
performed. 

Suppose that i > 4. It must then be determined whether i=4, 4 < i < 7, or 
i=7. From this point, the procedure is exactly the same as above. 

After each F_ is computed, the initial guess for ^ is computed. This 
consists only of a multiplication since the guess function for is = x 

Any questions that remain on this routine may be answered by examining 
the coding sheets provided. 
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16.8.3 Relaxation Routine 

The relaxation routine is very similiar to the routine for calculating F... 
However the relaxation routine has a few features which the F.. does not have 
and does not need. 

First, Index Register No. 3 is used to count the number of iterations and 
is checked against a previously determined upper limit. If the number of 
iterations reaches this value before convergence is achieved, the program 
will be stopped. A stop such as this will indicate either faulty data or a poor 
choice of upper limit. 

Also, after the residual is calculated for each point, the degree of con- 
vergence is checked. If convergence has not occurred, a mode control indica 
tor is set. After the eighth residual has been computed and checked for con- 
vergence, a check is made to see if this indicator was set during any of the 
convergence comparisons. If the indicator was set, the program again picks 
up the first point and continues through the remaining seven points. This 
process is repeated until convergence has been reached for each of the eight 
points. 
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16.8.4 Forecast Section 

The forecast coding is divided into 12 parts. These are separate codings 
to compute: 

U at position No. 5 

V at position No. 2 

(f) at position No. 1 

U* at position No. 2 

V* at position No. 5 
i 

9 at position No. 6 
U at position No. 7 

V at position No. 4 
at position No. 3 
U* at position No. 4 
V* at position No. 7 
0* at position No. 8 
Symbolic explanations for the coding instructions are given in the right 

hand columns and are mostly straightforward in nature. One situation 
should be described in detail however - the question of routings other than 
the nearest four neighbors. As has already been discussed, a given base 
PE is capable of communicating directly only with its nearest four neighbors 
itself, or Central Control. Hence, when data is required from locations 
other than these regions of access, special provision must be made to trans- 
fer the required piece of data to an address where it can be fetched readily 
by the base PE. In this sample problem, situations arise where data is re- 
quired from corner neighbors (i.e., nw, sw, se, or ne neighboring PE's) by 
the given finite difference scheme. Since this corner routing is not available 
it was necessary to transfer the required data into an appropriate neighborin 
PE where it is directly accessible by the base PE. In general, this access 
to corner PE's merely requires two additional instructions. This type of 
technique was applied to various portions of this sample coding where the 



(Refer to paragraph 16. 6. 1 for the 
detailed code used to obtain the 
averaged values for all variables) 
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forecast variables were being computed at corner positions of the PE's. 
Along this line, this same technique can be applied to any order finite differ- 
encing scheme if required values are stored in Processing Elements outside 
the nearest eight neighbors. 
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17. OTHER CONSIDERATIONS 



17. 1 HIGHER ORDER DIFFERENCING SCHEMES 

Due to the design of the SOLOMON System, there are several modifications 
that can be made in numerical forecasting problems of this nature. It has been 
found that use of a spherical grid system mapped onto some type of conformal 
map projection can create computational difficulties due to either decreasing 
grid size as the polar regions are approached or lack of resolution in certain 
critical regions due to the mapping technique. To compensate for the latter 
difficulty, it is very feasible to introduce varying mesh lengths at certain 
latitudes (or at all latitudes) to give the desired resolution (care must be taken 
however not to create regions of discontinuities where the grid mesh is 
changed significantly). Using the PE Network, one can store the various 
values of the grid mesh size without additional effort, rather than taking the 
grid size as being constant. One can therefore compensate for any distortion 
produced by the mapping technique and obtain the desired resolution in both 1 
latitude and longitude units. 

More sophisticated finite differencing schemes have either largely or 
totally eliminated the former problem. As was mentioned earlier, these 
schemes are based on energy conservation and involve grid points other than 
just the nearest four neighbors. If the four-grid-point-per-PE allocation is 
chosen, it is possible to utilize the parallel computation technique to the full- 
est extent. Suppose, for example, the following expression is to be evaluated. 

Q = C x [ (A u + A ul ) (a 2Q + a 0() ) - (A_ u + A^^) (a^ + a -2Q )] 

+C 2 Kl + (P 21 " P 1 } " (a Z-l + a -01> ^2-1 " P-01 } ] 



w) Computer and Data Systems 



17-1 



15090A 



+ C 3 [ A 10 (a 21 + Vl> * A -10 (a 01 + a -2-l> + Pio ( P 1 + P 2 -i> 

-P'-10 ( P.21 +P 0-1>] 

using the following variable distribution and PE allocation as shown in figure 
7-1. 

The code for the SOLOMON System is On the following coding forms. 

The first set of instructions 1 through 13 computes the value of the quanti- 
ties, (A 11 + A ) (a 2Q + a ), (A + A _ 1 _ 1 ) ( a 00 + a _20*' T ° d ° this ^ is 
necessary only to compute the value of (A + A ) (a + a ) at the base 

JLJ. J. ™ i. l*\J U U 

PE since the value of (A + A ) (a + a „ n ) will be computed automat- 

■11 "" 1 ™ JL \J\J *~ CdKJ 

ically by the west PE for its base No. 3 point (i. e. , (A^ + A^ ^) (a^Q + a Q0^ 
for west neighboring PE). Once (A + A ) (a_ + a ) has been computed 
at all PE's for point No. 3 and stored in some temporary location, all that is 
necessary to form the required difference is to specify a west routing for 
that temporary location and subtract this from the quantity stored in the same 
temporary location (same address) of the base PE. An averaged A must be 
routed to the southern boundaries of PE's via the Broadcast Register prior 
to the operations' similar to various situations in the forecasting problem. 
This same general procedure can be applied to each of the terms in the given 
expression. For example, the next set of instructions 18 through 29 compute 
the quantities (a^ + a^) (p^ - p^) and (a^ + a^) (p^ - p^); (a^ + 

cl q1 ) (P 21 - P Q1 ) for the base PE; and (a- 2 + a Q _ 1 ) (P 2 -l " ^0-1^ f ° r the south 
neighbor. Again care must be taken to supply this quantity for the southern 
boundary rows of PE's (in this case zero since the difference of the same 
averaged value is zero) via a south routing from the Broadcast Register. The 
instruction sets 14 through 17 and 30 through 34 are used to multiply the given 
quantities by their respective constants and to add the resulting quantities 
together. The instruction set 35 through 54 computes, in the same manner, 
the quantities a\ q (a^ + a^) , (a Q1 + a^) , p\ Q (P Q1 + P^), and 
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Figure 17-1. Grid Point - PE Allocation 
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^ 10 ^ 21 + ^0 1^' and combines tnese values as required. The instruction 
set 55 through 60 performs the necessary coefficient multiplication plus the 
final addition and storage of the expression Q. 

It can be seen that this parallel computation feature is at an optimum 
whenever the four-point PE allocation is used. In the forecast program, this 
feature was utilized only partially since it was desired to use the eight-point 
allocation. This is not a necessary feature but turns out to be a convenient 
one for finite difference schemes more involved than the conventional type. 
17. 2 RELAXATION TECHNIQUES ON THE SOLOMON SYSTEM 

Strictly speaking, the assumption in paragraph 16.7 is not exactly true. As 
it turns out, due to the parallel processing of all Processing Elements (1024 
points at a time), slightly fewer relaxations are required by the SOLOMON II 
System (4 points per PE) than by the sequential type of processing to obtain 
the same degree of convergence. This difference is very small (zero for the 
first case) but becomes more significant as the matrix size is increased. A 
numerical analysis was carried out to investigate this feature of convergence 
as accomplished by the two types of processing. This test consisted of solv- 
ing Laplace's equation (finite difference analogue) for different grid sizes. 
The number of iterations (relaxations) required for convergence by both types 
of processing plus the absolute magnitude of the convergence (maximum dif- 
ference), were then tabulated, and compared to determine which system pro- 
vided the fastest rate of convergence. This test was conducted by a 7094 
program and yielded the following results: (In all cases, a four -point- 
Processing-Element allocation was used. The initial field guess was zero 
and the boundary values were the same for all cases. ). 

No. Iterations No. Iterations 

Grid Size Sequential System SOLOMON II System 

4x4 16 16 

6 x 6 53 47 

8 x 8 92 90 

10 x 10 148 145 

12 x 12 216 211 
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The computer program was instructed to stop if the number of iterations re- 
quired to achieve convergence exceeded 250. Hence for the remaining cases, 
a relative difference in the magnitude of convergence by the two systems was 
all that was available. The remaining data are as follows, based on 250 
iterations: 





Maximum Difference, 


Maximum Difference 


Grid Size 


Sequential System 


SOLOMON II System 


14 x 14 


0.954 x 10" 6 


0.477 x 10" 


16 x 16 


0. 175 x 10" 4 


0. 846 x 10' 5 


18 x 18 


0. 144 x 10" 3 


0. 669 x 10" 4 


20 x 20 


0. 593 x 10" 3 


0.264 x 10" 3 



Note, for the sequential processing, a Liebmann relaxation technique was 
used. 

It must be noted that the above statistics were obtained from a particular 
case with a particular grid point-PE allocation and do not, in general, rep- 
resent exactly the rates of convergence for all types of problems requiring 
relaxation or iteration procedures for solution. It can be seen that for this 
case, however, the SOLOMON II System requires fewer iterations than the 
sequential method of processing to provide a convergent solution. In fact 
the sequential computer should be programmed to take advantage of this 
faster method of relaxation. 

It is also appropriate to discuss the main types of relaxation procedures 
and their implementation by the SOLOMON II System. In general, relaxation 
techniques can be divided into two main schemes: 

a. A procedure by which no new data are used to generate the current 
values (Richardson's Method) 

b. A procedure by which two new data points are used to obtain the 
current values (Liebmann 1 s Method). 

These two methods are best illustrated by an example. Consider the 
Helmholtz equation: 
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2 

V - a</> = F (x, y) 
For the first case, the residual is given by: 

r.. = 0. , . .+<£., . + <£. • , - 4+a 0.. - F.. 

ij i+l, J i-l, J i, J + l i, J-1 ij ij 

m = the order of approximation. 

Here, the residual (and hence the new value of 6.. denoted by + ^ for each 

ij ij 

point is based entirely on old values computed by the previous relaxation. 

For the second case (Ldebmann relaxation) the residual is computed by: 

m m , ,m , . m+1 , ,m+l m _ 

r.. = $ . + 0. .,,+£., . + <£. • n - 4+a) - F.. 
ij v i+l, j l, j + l r i-l, j l, j-1 ij ij 

where now r.. is based on three old values of 6 namely 6^?, (b^ 1 . . , , 6^ 

il , i ,i i+l, J i, J + l ij 

m+1 m+1 . 
and two new ones, <^ ^ j* ^i j 1° a se Pt uen tia-l process (increasing i, j) 

these values are always available at each point and utilization of this new 

data has been shown to produce a faster rate of convergence. The new values 

of 0_ are computed by a predictor-corrector formula given by: 

.m+1 ,m , a _m 

<P.. = 0.. + -r. — R.. 

ij ij 4+a ij 

a = a relaxation coefficient in the range 
0. 2 < a < 2. 

At first glance, it would appear that the SOLOMON II System with its 
parallel processing feature would be forced to use the Richardson relaxation 
method. Upon closer investigation however, it is seen that for more than one 
point per PE allocation, a pseudo-Liebmann relaxation scheme can be gen- 
erated. For example, take the four-point-per-PE case (see figure 17-2) 
computations at the first point (position No. 1) would have no new data avail- 
able from which to work. Hence this first case is very similar to the 

Richardson method. For the second point, (position No. 2), one new datum 

m+ 1 

is available, namely the value just computed at position No. 1, (<f>. .). For 

~ ■*■ » J 

the third point (position No. 3) again the value previously computed at position 
No. 1 is available (now (/^"V, 1 ). Finally, for the fourth point (position No. 4), 
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• 1 


2* 


• 3 


4# 



• I 2* 

SPNI5090-VA-5I 

Figure 17-2. Four- Point- Per-PE Case 



two new values are available, namely the new values at positions 2 and 3 
(0^ n+ .^L 1 > ^^t* .» respectively). (In actuality, the values at positions No. 2 
and No. 1 are also available for positions No. 3 and No. 4, respectively, but 
for this example are not required by the finite difference formula. ) 

It can be seen that generalization of the grid point - PE allocation will 
produce various combinations of available new and old data. It must be 
remembered that the above descriptive operations accounted only for Internal 
PE data and that corresponding new values (for a given position in the PE) 
are also generated concurrently in neighboring PE's. These new values 
may also be available, depending upon the requirements of the finite differ- 
ence scheme being used. Hence it is conceivable that values for a given 
point can be computed from almost entirely new data. This is realized at 
position No. 4 for this example where new values at position 2 and 3 (and 

hence S, 3 and E, 3 are available; d)? 1 ^}, , , (i!* 1 ^ ., d^^} ^f^T* .» respec- 

i, J + 1 i-l, J i, J-l 1+1, J 

tively). For the one -point -per-PE allocation, the above system obviously 
reduces to the Richardson method of relaxation. 
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18. SEQUENTIAL PROBLEMS 



Suppose it is desirable to run a sequential problem or a sequential sub- 
problem. If it is a sequential problem, there is only one logical choice. The 
GPU has the speed capability of approximately twice that of the IBM 7094. 
Thus any sequential problem could be run with greater speed in the GPU than 
in the PE Network. 

However, suppose at some point in a program, it is desired to execute 
a sequential subproblem. For example, assume that it is necessary to find 
the square root of some number or numbers. Now a choice exists. Mainly, 
would the operation be performed faster in the GPU or in the PE Network. 
This decision is left entirely to the programmer. 

If it is necessary to find the square root of one number, the GPU v/ould 
be the logical choice since it would be the faster unit. However, as the num- 
ber of values increase, the GPU loses its advantage. 

Now assume that there are 1000 numbers which must have their square 
root calculated. In this case the PE Network would be the faster choice since 
the calculations are performed in parallel rather than sequentially. 

To perform the desired operations in parallel, a scheme such as follows 
could be used: 

a. Store mode in memory location M 

b. Set all PE's to Mode X (X > 0) 

c. Set PE's applicable to the desired operation to Mode X-l 

d. Program the desired operation addressing all commands to Mode X-l 

e. Load original modes into the PE's from the memory location M 

f. Continue on original program 
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19. WAVE NUMBER SPACE COMPUTATION 



Another possible method of numerical weather forecasting is the integra- 
tion of the dynamic equations in wave number space. This method consists 
of transforming the prediction equations expressed in spatial coordinates 
into a wave number coordinate system leaving the time variable unchanged. 
This transformation results in a system of ordinary differential equations 
with time as the independent variable but whose spatial parts now contain 
interaction terms of various wave number components. The transformation 

is accomplished by the substitution: (m, n = wave numbers) 

CO ffl . , 

. . -V y . , . i (k mx + 1 n y) 

(x, y, t) = L L <Pi (m, n, t) e 

m= n= 1 

for planar harmonics and: 
00 00 

a. t a \ +\ y y a f , . i (k m0 + 1 n X) 

{6, X, t) = L, (m, n, t) e 

m= n= Z 

for spherical harmonics: where each term in the above series expansions 
must satisfy the given boundary conditions and is any dependent variable 
(single parameter model). For planar harmonies, substitution of the above 
expression into the dynamic equation of motion reduces the problem to the 
following general set of ordinary differential equations. 

$.=a. + Xb, * + I c * 

J jk 

where 

0. = the ith dependent variable 
l 

a. = a constant forcing function for each i 

ij = the coefficients of the linear terms 

c. = the coefficients of the nonlinear (advection) terms, 
ljk 
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The above mentioned coefficients are, generally, functions of the spectrum 
of wave numbers only and are of the form: 
b = B (J) 

c = C (J, K) 

ljk 

(J, N) 

J 

= 0(M - J, N - K) 

k 

where M, N equal wave numbers; J, K are indices of wave number spectrum 
under consideration. The series expansion is necessarily truncated to in- 
clude the desired wave components, and still retain computational feasibility. 
To initiate the forecast procedure, the initial values of $ in wave number co- 
ordinate must be specified. 

At first glance, this type of solution appears to lend itself solely to a se- 
quential type of processing. This is due to the fact that a given wave compo- 
nent is capable of interacting with any other wave component within the limits 
of the summation, rather than merely the nearest four or eight neighbors as 
is the case with the conventional finite differencing process. This is equiv- 
alent to saying that derivatives at a given point (wave number) depend upon 
given information in the entire domain rather than just neighboring informa- 
tion. 

However, recalling that the PE Network can be connected into a two dimen- 
sional cylinder (i. e. , connections to ends of rows and columns), a possible 
method for utilizing the network for this type of problem can be outlined. For 
this shape, the same set of communications apply as before, except for this 
case, just the internal arithmetic operations will be considered (i.e. , oper- 
ations within base PE). A possible procedure would be to represent each 
wave number component M, N by a single Processing Element and perform 
the internal operations to form the various required interaction coefficients 
for that particular set of wave number components. The next step would 
then be to shift the original wave component data one Processing Element to 
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the right (i. e. , to east neighbor), then repeat the above computations, leav- 
ing the original data stored in the original base PE. This procedure could 
then be repeated. 32 times until all possible combinations for that particular 
row (and hence all rows in the network) have been completed. The third step 
would be to shift one more PE to the right (made necessary to restore orig- 
inal data to its original location), then shift all data down one row of Process- 
ing Elements (and hence shift data down one row for all PE's). The procedure 
outlined previously is then repeated for this next configuration of wave number 
data until all 32 PE's are again accounted for. The data for this row of cal- 
culations are then shifted down again one row and the computations again 
shifted across the new row. This double shifting procedure is repeated until 
all possible combinations of interaction coefficients have been computed (i. e. , 
entire spectrum of wave numbers is covered). Access from the bottom to top 
rows of Processing Elements and from right- to left-hand columns of Process- 
ing Elements is supplied by connections to and from these components, respec- 
tively. At each shift along a row, 32 x 32 or 1024 calculations per interac- 
tion coefficient are performed. For each completed row, 32 x 32 x 32 or 
32, 768 calculations per coefficient are performed. For the entire spectrum af- 
ter 32 shifts across and 32 shifts down, 32 x 32 x 32 x 32 or 1, 048, 576 calcu- 
lations per coefficient are performed. This method will compute all of the re- 
quired interaction coefficients and store them provided enough storage is avail- 
able. To alleviate the storage problem, it may be possible to perform the 
summation concurrently with the shifting procedure, thereby accumulating 
the necessary interaction coefficients and parameters as required by the series 
expansions. This method has not yet been fully investigated but comes to 
mind as a possible means of employing the PE Network to good advantage for 
this type of problem. An alternate method would obviously be to use the Gen- 
eral Purpose and Network Control Units to perform the required operations 
sequentially. The speed in this latter case would be on a par with twice that 
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of the IBM 7094. A third possibility (and perhaps the most feasible) would 
be to use both the PE Network and the Central Control Units simultaneously; 
the PE Network to compute the interaction coefficients by the method outlined 
previously and the Control Units (General Purpose and Network Units) to 
perform the required summation and bookkeeping. The interaction coeffi- 
cients could be read out of PE memory into the Network Control unit via the 
L-Buffer as fast as they are computed, thereby making them available for 
the summation process to be performed in the General Purpose Unit concur- 
rently. In any case, the final transformation of the solution from wave num- 
ber coordinates back into spatial coordinates can be handled efficiently by the 
network. See figures 19-1, 19-2, and 19-3. 
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Figure 19-1. Diagrams for Network Configuration for Wave 
Space Computation, Initial State 
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Figure 19-2. First Shift to Right 
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Figure 19-3. First Row Completed and First Shift Down 
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20. SUMMARY 



The purpose of this report was to demonstrate the application of the 
SOLOMON II Computer to particular problems. The major emphasis has 
been placed on numerical weather prediction problems. To clarify these 
applications, detailed codes have been supplied, where applicable, to enable 
the reader to gain a fuller understanding of the basic characteristics of the 
SOLOMON II System. The example programs illustrated the concept of writ- 
ing one program to control the NCU and the PE Network. An analogous situa- 
tion for a sequential computer would be the separation of bookkeeping (NCU 
instructions) and arithmetic (PE instructions) operations. 

Mode control provides the programmer with a powerful method of control- 
ling the network. In the relaxation routine, modes were used to control 
boundary points. For the relaxation example the boundary was a simple 
geometric shape (rectangular) and indexing on a sequential computer was 
straightforward. However, as the boundary becomes irregular, indexing 
on a sequential computer becomes much more difficult and, perhaps more 
important, time consuming. As an example of the utility of mode control, 
consider the effect of oceans on atmospheric movements. It has become in- 
creasingly apparent that the oceans are a vast reservoir of heat energy and 
hence play an important part in atmospheric heat transfer (at sea and air 
interface) at various portions of the earth's surface. Consideration of oceano- 
graphic analysis implies incorporation of the continental boundaries into the 
atmospheric models; boundaries which are, in general, extremely irregular 
in nature. To account for these irregular boundaries at the earth's surface, 
an extremely complicated indexing and bookkeeping subroutine would be 
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required to perform this type of computation on a sequential computer. This 
problem can be eliminated completely, with little effort, on the SOLOMON 
II Computer by the use of Mode Control. All points which lie on a continental 
boundary can be set to a different mode, thereby removing once and for all, 
the time consuming procedure of locating these points sequentially. The use 
of Mode Control with respect to variable geometry has already been described 
in the first portion of this report. 

Short range forecasting provides another situation where Mode Control 
can be an extremely useful facility. For this type of problem, there are 
various sets and sub- sets of points to be considered in addition to variable 
boundary conditions (i. e. , edge and points whose values vary with time and 
space). These sets and sub-sets of points can be either on the boundaries 
or in the interior field. For this type of problem, each set or sub- set can 
be assigned a different mode, and the necessary operation performed sepa- 
rately for each set. (The method for obtaining these modes is explained in 
Section 3. ) At first glance, this appears to be an extremely inefficient method 
of handling this type of problem due to the fact that there may only be a few 
points in a given set. However, due to the speed which the SOLOMON II 
System can execute the necessary instructions, all of the necessary sets or 
sub- sets of points can still be handled faster by Mode Control in the Network 
than by a sequentially organized computer. Generally speaking, the treat- 
ment of the variable boundaries varies but can again be handled by Mode 
Control in the same manner as for the oceanographic case. The main use of 
Geometric Control (a second method of network control) is during input/ output. 
These procedures are well explained in the text of the report. 

With regard to the Broadcast Register, it was stated that two options were 
available; to supply constants to the entire Network and to provide a north or 
south routine to adjacent rows of PE's respectively. The coding of the prim- 
itive equation model was written on the premise that both options were avail- 
able. It should be noted that the second option is not a necessity but merely 
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serves to facilitate the programming. Without this option, the averaged 
values would have to be transferred into the nearest row of PE's (either 
north or south rows) in some temporary location before the forecast could 
proceed. Additional programming would now be necessary since two bound- 
aries have been created which will not have available a north or south routing; 
hence some of the features used in the given coding could not be utilized as 
conveniently for this case. Mode Control would then be used to account for 
the boundary rows of PE's. It can be seen, therefore, that this second option 
provides the programmer with two important advantages. 

a. All Processing Elements (including boundary PE's) can execute the 
given set of instructions concurrently. 

b. No special programming is required at the "boundaries" except 
for loading the Broadcast Register with the appropriate value. 

Through the routing facility, each PE has access to 

a. Central memory via the Broadcast Register and L-Buffer. 

b. Any of the memories of the four nearest neighbors. 

c. Its own memory. 

This routing facility is extremely useful in the solution of partial differ- 
ential equations. This is true even in situations where data are required from 
Processing Elements other than the nearest four neighbors, by a given Base 
PE; to illustrate these features, the process of obtaining solutions to a sample 
global circulation model by the method of finite differences has been greatly 
stressed. Extensive research has shown this representation to be an ex- 
tremely powerful method for solving the hydrodynamic equations of the atmos- 
phere on a large scale for an extended period of time. Care must be taken, 
however, to ensure that these finite difference representations be approxi- 
mately conservative in nature. For primitive equation models in general, 
the trend has been toward higher order differencing schemes in order to 
conserve energy (as much as possible) and hence preserve long range stabil- 
ity. It has been briefly illustrated in paragraph 17.1 how the SOLOMON II 
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System can be readily applied to higher order differencing schemes. From 
the discussion on the number of points per PE, it can be seen that this allo- 
cation can be set to any number (Section 1Z). (Four points per PE turns out 
to be an optimum case but this in no way restricts the programmer to this 
distribution.) It has also been shown in the forecast section of the coding 
(paragraph 16. 8) how access is gained to data in "corner" Processing Elements 
with just two additional instructions. Hence the adoption of higher order 
finite differencing schemes will present little or no programming difficulty 
and should give a significant time ratio in favor of the SOLOMON II Computer, 
over a sequentially organized system. 

It has been stated previously that the SOLOMON II System is a fixed word 
length machine. It has also been found that a 20-bit word will ensure suffi- 
cient accuracy for practically all numerical forecasting purposes. If however, 
it is desired to increase the accuracy beyond this limit, a facility for doing 
this is provided in the double precision routine (Section 8) where 40-bit oper- 
ations can be performed in the Network. This double precision routine will, 
however, be roughly three times as slow as a SOLOMON II designed for a 40- 
bit-word length. 

It should be noted that the forecasting models now being developed will 

place demands on a computing system that cannot be adequately satisfied by 

sequentially organized computers. For example, the model proposed by 

9 

Philips* requires 9 x 10 multiplications for the computation alone, requiring 
5. 5 hours of 7090 time to generate a 24-hour forecast. Long range forecasts 
may go up to 96 hours, while experimental atmospheric circulation forecasts 
could range up to 200 days or more. The bookkeeping instructions (checking 
for overflow, scaling in a fixed point routine, input/output, etc) may increase 
the execution time by a factor of 2 or 3. Also this model requires roughly 
270, 000 words of data, implying that the complete data set cannot be stored 
in the core memory and must be retrieved and restored for each time step. 

* N. A. Philips, "Advances in Computers, " Numerical Weather Prediction, 
Academic Press, New York, I960, pp. 43-86. 



15090A 



20-4 



Computer and Data Systems 




In atmospheric circulation computations, it is generally required that the 
data for two time steps (i. e. , data at time t-1, and time t) be retained for 
computation of the forecast variables at time t+1* In addition to this, a 
working storage must be provided for the necessary operations to obtain these 
required new values. In the SOLOMON II System, the storage available per 
Processing Element is, for a 20-bit word, 2000 words in a 40, 000-bit mem- 
ory. The total PE array storage is 1024 x 2000 or 2, 048, 000 words; 1024 x 
40, 000 or 40, 960, 000 bits. These data are in the random access core mem- 
ory. The storage per PE can be increased to 8, 000 words in a 160, 000-bit 
memory or a total Network storage of 8, 192, 000 words (163, 840, 000 bits). 
To augment the PE core storage, the SOLOMON II System will have a disk 
storage system. Up to 16 disks are available and each disk can store 
53, 760, 000 bits or 53, 760, 000/20 = 2, 688, 000 words. These disks are con- 
trolled by a data channel and communicate with the PE's via the L- Buffer . 
Data rates are available in SOLOMON Project Technical Memorandum No. 25, 
"The SOLOMON II Physical Characteristics." 

For many general circulation models, this Network storage will be suffi- 
cient to store most (if not all) of the required levels of data at one time, thereby 
reducing considerably (if not completely) transferring of data from disk into 
the Network and back again due to storage limitations . For example, take 
the case of a 9-level model containing 10, 000 grid points per level with 7 pa- 
rameters to be carried at each point. The total apparent storage would then 
be 10, 000 x 9 x 7 or 630, 000 words. But this must be available for three 
time steps at a time (i.e., time steps t-1, t, and working storage for time 
t+1) which gives a total storage requirement of 630, 000 x 3 or 1, 890, 000 
words. Since the PE Network has 2, 048, 000 words of storage, it can be 
seen that even for this large storage requirement, the problem could con- 
ceivably be stored completely in the Network (3 time steps at a time) and 
run without unnecessary data transfer each time step. This feature would 
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also provide faster and more complete forms of output since the entire level 
of data would be available at a given time. This is particularly desirable 
when requesting output on a global basis. 

The associative memory property of SOLOMON II provides an extremely 
fast method of table look-up. Since a major portion of compiling time is 
consumed in table look-up, the table look-up (and large storage) should pro- 
vide a reduction in compiling times. 

As was stated in paragraph 16. 8, the time estimate for coding of the 
primitive equation model was subdivided into the following three categories - 
computation time for initial guess field and initial data; computation time for 
one iteration of the relaxation procedure; and computation time for one time 
step in the forecast equations (to include averaging procedure at poles). 
From the results obtained from the coding, the following table can be con- 
structed: 



TABLE 20-1 
TIME ESTIMATES 



Computation 


Time (Minoseconds) 


Wind field 


198.0 


F. . 


451.0 






Total 


649.0 


One relaxation for initialization 


1070.0 


Position No. 3 (per time step) 


167.0 


i 

U Position No. 2 (per time step) 


149.6 


i 

V Position No. 5 (per time step) 


176. 2 


Position No. 6 (per time step) 


164. 8 


U* Position No. 4 (per time step) 


147.6 


V* Position No. 1 (per time step) 


171.6 


Position No. 8 (per time step) 


169.2 


(f) Position No. 1 (per time step) 


171.4 


V Position No. 2 (per time step) 


171.6 


V Position No. 4 (per time step) 


165.4 
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TABLE 20-1 (Continued) 



Computation 


Time (Minoseconds) 


U Position No. 5 (per time step) 


147. 8 


U Position No. 7 (per time step) 


147. 6 


Averaging procedure (per time step) 


290. 


Total 


2239. 8 



From the table given above, the total time estimates for the three respec- 
tive sections are: computation of wind field and F. . - 649 microseconds; 
one iteration of relaxation procedure - 1070 microseconds; one time step in 
forecast equations - 2239. 8 microseconds. 



The results from the Baratropic Model are: 
Baratropic Model - 804 microseconds. 

The time ratio 2000:1 obtained for the SOLOMON II System compared to 
the IBM 7090 with respect to the Baratropic Model is extremely large. This 
is an indication of the speed advantages which can be obtained on the parallel 
organized SOLOMON II System, over a sequentially organized computer (7090) 
for the idealized case considered. 

A running time estimate was made on the primitive equation model pre- 
sented in this paper for the SOLOMON II System. Since a corresponding 
coding for a sequential computer is not yet available, a direct comparison 
between the two systems is not possible at the present time. It is felt, 
however, that when this is done, the result will show the SOLOMON II 
Computer to be substantially faster (by more than an order of magnitude) 
than the fastest sequentially organized computer currently on the market. 
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